#loading packages
library(DiagrammeR)
Missing data arise when values that were meant to be recorded are absent from a dataset. This can happen for many reasons, intentional or unintentional, and the reasons are classified into the following three categories, known as missingness mechanisms (Mainzer et al. 2023):
Missing completely at random (MCAR): the probability that a value is missing is independent of both the observed and the unobserved data.
Missing at random (MAR): the probability that a value is missing depends only on the observed values.
Missing not at random (MNAR): the probability that a value is missing depends on the unobserved (missing) values themselves, possibly in addition to the observed values.
Figure 1: Graphical Representation of Missingness Mechanisms (Schafer and Graham 2002)
(X are the completely observed variables. Y are the partly missing variables. Z is the component of the cause of missingness unrelated to X and Y. R is the missingness.)
Looking for patterns in the missing data can help us determine which category they belong to. The mechanism is important in deciding how to handle the missing data. MCAR is the best-case scenario but seldom occurs; MAR and MNAR are more common.
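To make the three mechanisms concrete, the following is a minimal simulation sketch in base R. The variables `x` and `y`, the 20% missingness rate, and the logistic coefficients are illustrative assumptions, not values taken from any cited study.

```r
# Toy simulation of the three missingness mechanisms (all names are illustrative)
set.seed(1)
n <- 1000
x <- rnorm(n)              # fully observed variable X
y <- 0.5 * x + rnorm(n)    # variable Y that will be made partly missing

# MCAR: every value of y has the same 20% chance of being missing
r_mcar <- runif(n) < 0.2

# MAR: the chance of missingness depends only on the observed x
r_mar <- runif(n) < plogis(-1.5 + 1.5 * x)

# MNAR: the chance of missingness depends on the value of y itself
r_mnar <- runif(n) < plogis(-1.5 + 1.5 * y)

y_mcar <- ifelse(r_mcar, NA, y)
y_mar  <- ifelse(r_mar,  NA, y)
y_mnar <- ifelse(r_mnar, NA, y)

# Compare the mean of the observed y values with the full-data mean
c(full = mean(y),
  mcar = mean(y_mcar, na.rm = TRUE),
  mar  = mean(y_mar,  na.rm = TRUE),
  mnar = mean(y_mnar, na.rm = TRUE))
```

In this sketch the observed mean under MCAR should stay close to the full-data mean, while under MAR and MNAR it drifts, because large values are more likely to be removed.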
Ignoring missing values is a problem because the remaining data may not be representative of the full dataset, which can bias the analysis. It also reduces the statistical power of the analysis (van Ginkel et al. 2020). To enhance the quality of the research, two steps should be followed: explicitly acknowledge missing data problems and the conditions under which they occur, and employ principled methods to handle the missing data (Dong and Peng 2013).
There are three types of methods for dealing with missing data: likelihood and Bayesian methods, weighting methods, and imputation methods (Cao et al. 2021). Missing data can also be handled by simply deleting the affected observations.
The likelihood and Bayesian method combines information from a prior distribution with evidence obtained from the sample to predict a value. It requires technical coding and advanced statistical knowledge.
The weighting method is a traditional approach in which weights from the available data are used to adjust for non-response in a survey. It becomes inefficient when there are extreme weights or when many weights are needed.
The imputation method replaces each missing value with an estimate derived from the original dataset. There are two types of imputation: single and multiple.
Listwise deletion is when the entire observation is removed from the dataset. Deleting missing data can lead to the loss of important information regarding your dataset and is therefore not recommended. In certain cases, when the amount of missing data is small and the type is MCAR, listwise deletion can be used. There usually won’t be bias but potentially important information may be lost.
T-tests and chi-square tests can be used to compare cases with and without missing values on the other variables, to determine whether the groups differ significantly. According to van Ginkel et al. (2020), if a test is significant, the null hypothesis is rejected, indicating that the missing values are not randomly scattered throughout the data. This implies that the missing data are MAR or MNAR. Conversely, if the tests are nonsignificant, this implies that the data cannot be MAR. It does not eliminate the possibility that the data are MNAR; other information about the population is needed to determine this.
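As a rough illustration of these tests, the sketch below builds a missingness indicator for one variable and compares the two resulting groups on other variables. The data are simulated and the names `age`, `job` and `income` are illustrative assumptions; they are not the credit data analyzed later.

```r
# Simulated data: income is made more likely to be missing for older subjects
set.seed(2)
n <- 500
age    <- round(rnorm(n, mean = 37, sd = 11))
job    <- sample(c("fixed", "freelance"), n, replace = TRUE)
income <- 100 + 1.2 * age + rnorm(n, sd = 30)
income[runif(n) < plogis(-3 + 0.05 * age)] <- NA

# Missingness indicator: TRUE where income is missing
miss_income <- factor(is.na(income))

# t-test: does mean age differ between rows with and without income?
tt <- t.test(age ~ miss_income)
tt$p.value

# Chi-square test: is missingness associated with a categorical variable?
ct <- chisq.test(table(job, miss_income))
ct$p.value
```

A small p-value from either test would suggest that missingness is related to that variable, which argues against MCAR.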
Whenever missing data are categorized as MAR or MNAR, listwise deletion is wasteful and biases the analysis. Alternative methods of dealing with the missing data are recommended: either pairwise deletion or imputation.
Pairwise deletion is when only the missing variable of an observation is removed. It allows more data to be analyzed than listwise deletion but limits the ability to make inferences about the total sample. For this reason, it is recommended to use imputation to properly deal with missing data.
Imputation is the preferred method to handle missing data. It consists of replacing missing data with an estimate obtained from the original, available data. After imputation, there will be a full dataset to analyze. To improve statistical power, the number of imputations created should be at least equal to the percentage of missing data (5% equals 5 imputations, 10% equals 10 imputations, 20% equals 20 imputations, etc.) (Pedersen et al. 2017). According to Wulff and Jeppesen (2017), however, 3-5 imputations are sufficient, and 10 are more than enough.
Single, or univariate, imputation is when only one estimate is used to replace the missing data. Methods of single imputation include using the mean, the last observation carried forward, and random imputation. The following is a brief explanation of each:
Using the mean to replace a missing value is a straightforward process. The mean of the dataset is calculated, including the missing value. The mean is then multiplied by the number of observations in the study. Next, the known values are subtracted from the product, which gives an estimate that can be used for any missing values. The problem with this method is that it reduces the variance, which leads to narrower confidence intervals.
Last Observation Carried Forward (LOCF) is a technique for replacing a missing value in longitudinal studies with a previously observed value (the most recent value is carried forward) (Streiner 2008). The problem with this method is that it assumes the previously observed value remains unchanged, which in reality is most likely not the case.
Random imputation is a method of randomly drawing an observation and using that observation for any of the missing values. The problem with this method is that it introduces additional variability.
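The three single imputation methods can be sketched in base R on a toy vector. This is a minimal illustration under stated assumptions: the vector `x` and the helper `locf()` are hypothetical, and mean imputation is shown in its common form, where each missing value is replaced by the mean of the observed values.

```r
# Toy vector with two missing values (illustrative data)
x <- c(12, NA, 15, 14, NA, 20)

# Mean imputation: replace each NA with the mean of the observed values
x_mean <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# LOCF: carry the most recent observed value forward
# (hypothetical helper; a leading NA would stay missing)
locf <- function(v) {
  for (i in seq_along(v)[-1]) {
    if (is.na(v[i])) v[i] <- v[i - 1]
  }
  v
}
x_locf <- locf(x)

# Random imputation: draw replacements at random from the observed values
set.seed(3)
x_rand <- x
x_rand[is.na(x)] <- sample(x[!is.na(x)], sum(is.na(x)), replace = TRUE)

# Mean imputation shrinks the variance relative to the observed values
c(observed = var(x, na.rm = TRUE), mean_imputed = var(x_mean))
```

The final comparison shows the variance reduction caused by mean imputation on this toy vector.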
These single imputation methods are flawed. They often result in underestimated standard errors or p-values that are too small (Dong and Peng 2013), which can cause bias in the analysis. Therefore, multiple imputation is the better method because it handles missing data better and provides less biased results.
Multiple, or multivariate, imputation is when several estimates are used to replace each missing value, creating multiple completed versions of the original dataset. It can be done with a regression model, or a sequence of regression models, such as linear, logistic and Poisson. A set of m plausible values is generated for each unobserved data point, resulting in m complete data sets (Dong and Peng 2013). The new values are randomly drawn from predictive distributions, either through joint modeling (JM, which is not used much anymore) or fully conditional specification (FCS) (Wongkamthong and Akande 2023). Each completed data set is then analyzed, and the results are combined into a single set of pooled estimates.
The purpose of multiple imputation is to create a pool of imputed data for analysis, but if the pooled results are lacking, then multiple imputation should not be done (Mainzer et al. 2023). Another reason not to use multiple imputation is if there are very few missing values, in which case there may be no benefit in using it. It is also worth noting that some statistical analysis software already has built-in features to deal with missing data.
Multiple imputation by chained equations, otherwise known as MICE, is the most common and preferred method of multiple imputation (Wulff and Jeppesen 2017). It provides a more reliable way to analyze data with missing values. For this reason, this paper will focus on the methodology and application of the MICE process.
#loading packages
library(DiagrammeR)
Figure 2: Flowchart of the MICE-process based on procedures proposed by Rubin (Wulff and Jeppesen 2017)
DiagrammeR::grViz("digraph {
# initiate graph
graph [layout = dot, rankdir = LR, label = 'The MICE-Process\n\n',labelloc = t, fontcolor = DarkSlateBlue, fontsize = 45]
# global node settings
node [shape = rectangle, style = filled, fillcolor = AliceBlue, fontcolor = DarkSlateBlue, fontsize = 35]
bgcolor = none
# label nodes
incomplete [label = 'Incomplete data set']
imputed1 [label = 'Imputed \n data set 1']
estimates1 [label = 'Estimates from \n analysis 1']
rubin [label = 'Rubin rules', shape = diamond]
combined [label = 'Combined results']
imputed2 [label = 'Imputed \n data set 2']
estimates2 [label = 'Estimates from \n analysis 2']
imputedm [label = 'Imputed \n data set m']
estimatesm [label = 'Estimates from \n analysis m']
# edge definitions with the node IDs
incomplete -> imputed1 [arrowhead = vee, color = DarkSlateBlue]
imputed1 -> estimates1 [arrowhead = vee, color = DarkSlateBlue]
estimates1 -> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputed2 [arrowhead = vee, color = DarkSlateBlue]
imputed2 -> estimates2 [arrowhead = vee, color = DarkSlateBlue]
estimates2-> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputedm [arrowhead = vee, color = DarkSlateBlue]
imputedm -> estimatesm [arrowhead = vee, color = DarkSlateBlue]
estimatesm -> rubin [arrowhead = vee, color = DarkSlateBlue]
rubin -> combined [arrowhead = vee, color = DarkSlateBlue]
}")
*Rubin’s Rules: Average the estimates across the m analyses. Calculate the standard errors and variance of the m estimates. Combine using an adjustment term (1+1/m).
Other imputation methods are worth noting and are briefly described below.
Regression Imputation is based on a linear regression model. Imputed values are randomly drawn from a conditional distribution when variables are continuous and from a logistic regression model when they are categorical (van Ginkel et al. 2020).
Predictive Mean Matching is also based on a linear regression model. The approach is the same as regression imputation for categorical missing values but differs for continuous variables. Instead of random draws from a conditional distribution, imputed values are based on predicted values of the outcome variable (van Ginkel et al. 2020).
Hot Deck (HD) imputation is when a missing value is replaced by an observed response of a similar unit, also known as the donor. It can be either random or deterministic (based on a metric or value) (Thongsri and Samart 2022). It does not rely on model fitting.
Stochastic Regression (SR) Imputation is an extension of regression imputation. The process is the same, but a random residual term, drawn from a normal distribution based on the regression of the outcome on the predictors, is added to each imputed value (Thongsri and Samart 2022). This maintains the variability of the data.
Random Forest (RF) Imputation is based on machine learning algorithms. Missing values are first replaced with the mean or mode of that particular variable and then the dataset is split into a training set and a prediction set (Thongsri and Samart 2022). The missing values are then replaced with predictions from these sets. This type of imputation can be used on continuous or categorical variables with complex interactions.
Multiple Imputation by Chained Equations (MICE)
In multiple imputation, M imputed values are created for each missing value, resulting in M complete datasets. For each dataset m = 1, ..., M, an estimate \({\hat{\theta}}_{m}\) of \(\theta\) is acquired, together with its estimated variance \({\hat{\phi}}_{m}\).
The combined estimator of \(\theta\) is given by:
\({\hat{\theta}}_{M} = \displaystyle \frac{1}{M}\sum_{m = 1}^{M} {\hat{\theta}}_{m}\)
The proposed variance estimator of \({\hat{\theta}}_{M}\) is given by:
\({\hat{\Phi}}_{M} = {\overline{\phi}}_{M} + \left(1 + \displaystyle \frac{1}{M}\right) B_{M}\)
where \({\overline{\phi}}_{M} = \displaystyle \frac{1}{M}\sum_{m = 1}^{M} {\hat{\phi}}_{m}\) is the average within-imputation variance
and \(B_{M} = \displaystyle \frac{1}{M-1}\sum_{m = 1}^{M} ({\hat{\theta}}_{m} - {\hat{\theta}}_{M})^{2}\) is the between-imputation variance.
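The pooling formulas above can be sketched as a small R function. The function name `pool_rubin()` and the example estimates are hypothetical and chosen only to make the arithmetic easy to follow; in practice the mice package provides its own pooling through `pool()`.

```r
# Hypothetical helper that pools M estimates and their variances
pool_rubin <- function(theta_hat, phi_hat) {
  M       <- length(theta_hat)
  theta_M <- mean(theta_hat)                          # combined estimator
  phi_bar <- mean(phi_hat)                            # within-imputation variance
  B_M     <- sum((theta_hat - theta_M)^2) / (M - 1)   # between-imputation variance
  Phi_M   <- phi_bar + (1 + 1 / M) * B_M              # total variance
  c(estimate = theta_M, within = phi_bar, between = B_M,
    total_var = Phi_M, se = sqrt(Phi_M))
}

# Made-up estimates and variances from M = 5 imputed datasets
theta_hat <- c(1.9, 2.1, 2.0, 2.2, 1.8)
phi_hat   <- c(0.04, 0.05, 0.04, 0.06, 0.05)
pooled <- pool_rubin(theta_hat, phi_hat)
pooled
```

For these made-up inputs the combined estimate is 2.0, the within-imputation variance is 0.048, the between-imputation variance is 0.025, and the total variance is 0.048 + 1.2 × 0.025 = 0.078.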
The chained equation process has the following steps (Azur et al. 2011):
1. Replace every missing value with a simple imputation, such as the mean; these values are referred to as “place holders”.
2. Set the “place holder” values for one variable back to missing.
3. Regress the observed values of this variable (the dependent variable) on the other variables (the independent variables) in the model, under the same assumptions as when performing linear, logistic, or Poisson regression.
4. Replace the missing values of this variable with predictions from the newly fitted model.
5. Repeat Steps 2-4 for each variable that has missing values, until all missing values have been replaced.
6. Repeat Steps 2-4 for a number of cycles, updating the imputations at each cycle, and repeat the whole procedure for each of the “m” imputations required.
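The steps above can be sketched in base R for the simplest case of continuous variables and linear regression. This is a toy illustration, not the algorithm as implemented in the mice package: the data, the variable names `x1` to `x3`, the choice of 10 cycles, and the added residual noise are all assumptions made for the example.

```r
# Simulated data with missing values in two of three numeric variables
set.seed(4)
n  <- 200
x1 <- rnorm(n)
x2 <- 2 + 0.8 * x1 + rnorm(n, sd = 0.5)
x3 <- 1 - 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.5)
dat <- data.frame(x1 = x1, x2 = x2, x3 = x3)
dat$x2[sample(n, 30)] <- NA
dat$x3[sample(n, 20)] <- NA

miss <- is.na(dat)                                # where the gaps are
vars_with_na <- names(which(colSums(miss) > 0))

# Step 1: fill every gap with a place holder (the variable's observed mean)
imp <- dat
for (v in vars_with_na) {
  imp[miss[, v], v] <- mean(dat[[v]], na.rm = TRUE)
}

# Steps 2-5, repeated for several cycles (Step 6)
for (cycle in 1:10) {
  for (v in vars_with_na) {
    obs <- !miss[, v]
    # Steps 2-3: fit on rows where v was observed, using the other variables
    fit <- lm(reformulate(setdiff(names(imp), v), response = v),
              data = imp[obs, ])
    # Step 4: predict the missing entries; noise is added here (an assumption)
    # so that the imputations behave like draws rather than fitted values
    pred <- predict(fit, newdata = imp[!obs, ])
    imp[!obs, v] <- pred + rnorm(sum(!obs), sd = summary(fit)$sigma)
  }
}

colSums(is.na(imp))
```

Running the whole procedure again with different random draws would give a second imputed dataset, and so on up to the m datasets required.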
# load data
credit = read.csv("credit_data.csv")
# load libraries
library(gtsummary)
library(dplyr, warn.conflicts=FALSE)
library(mice, warn.conflicts=FALSE)
Credit score data
The credit.csv file is from the website of Dr. Lluís A. Belanche Muñoz, by way of a GitHub repository of Dr. Gaston Sanchez. It contains data on 4,454 subjects and stores a combination of continuous, categorical and count values for 15 variables. Of the 15 variables, the “Status” variable contains the binary categorical values “good” and “bad”, describing the kind of credit score each subject has. One data point was missing an outcome and was removed from the original data.
| Variable | Type | Description |
|---|---|---|
| X | Integer | Count variable indicating the number of subjects. |
| Status | Character | 2-level categorical variable indicating the status of the subject’s credit: good or bad. |
| Seniority | Integer | Count variable indicating the seniority a subject has accumulated over the course of their life. |
| Home | Character | 6-level categorical variable indicating the subject’s relationship to their residential address: rent, owner, parents, priv, other, or ignore. |
| Time | Integer | Count variable showing how many months have elapsed since the subject’s payment deadline without paying their debt in full. |
| Age | Integer | Count variable indicating subject’s age (in years). |
| Marital | Character | 5-level categorical variable indicating the subject’s marital status: single, married, separated, divorced, or widow. |
| Records | Character | 2-level categorical variable indicating whether the subject has a credit history record: yes or no. |
| Job | Character | 4-level categorical variable indicating the type of job the subject has: fixed, freelance, partime, or others. |
| Expenses | Integer | Count variable indicating the amount of expenses (in USD) a subject has. |
| Income | Integer | Count variable indicating the amount of income (in thousands of USD) a subject earns annually. |
| Assets | Integer | Count variable indicating the amount of assets (in USD) a subject has. |
| Debt | Integer | Count variable indicating the amount of debt (in USD) a subject has. |
| Amount | Integer | Count variable indicating the amount of money (in USD) remaining in a subject’s bank account. |
| Price | Integer | Count variable indicating the amount of money a subject earns by the end of the month. |
credit %>%
tbl_summary(by = Status,
missing_text = "NA") %>%
add_p() %>%
add_n() %>%
add_overall() %>%
modify_header(label ~ "**Variable**") %>%
modify_caption("**Summary of Credit Data**") %>%
bold_labels()
| Variable | N | Overall, N = 4,454¹ | bad, N = 1,254¹ | good, N = 3,200¹ | p-value² |
|---|---|---|---|---|---|
| X | 4,454 | 2,228 (1,114, 3,341) | 2,222 (1,142, 3,366) | 2,232 (1,098, 3,326) | 0.3 |
| Seniority | 4,454 | 5 (2, 12) | 2 (1, 6) | 7 (2, 14) | <0.001 |
| Home | 4,448 | | | | <0.001 |
| ignore | | 20 (0.4%) | 9 (0.7%) | 11 (0.3%) | |
| other | | 319 (7.2%) | 146 (12%) | 173 (5.4%) | |
| owner | | 2,107 (47%) | 390 (31%) | 1,717 (54%) | |
| parents | | 783 (18%) | 233 (19%) | 550 (17%) | |
| priv | | 246 (5.5%) | 84 (6.7%) | 162 (5.1%) | |
| rent | | 973 (22%) | 388 (31%) | 585 (18%) | |
| NA | | 6 | 4 | 2 | |
| Time | 4,454 | 48 (36, 60) | 48 (36, 60) | 48 (36, 60) | <0.001 |
| Age | 4,454 | 36 (28, 45) | 34 (27, 42) | 36 (28, 46) | <0.001 |
| Marital | 4,453 | | | | <0.001 |
| divorced | | 38 (0.9%) | 14 (1.1%) | 24 (0.8%) | |
| married | | 3,241 (73%) | 829 (66%) | 2,412 (75%) | |
| separated | | 130 (2.9%) | 64 (5.1%) | 66 (2.1%) | |
| single | | 977 (22%) | 328 (26%) | 649 (20%) | |
| widow | | 67 (1.5%) | 19 (1.5%) | 48 (1.5%) | |
| NA | | 1 | 0 | 1 | |
| Records | 4,454 | 773 (17%) | 429 (34%) | 344 (11%) | <0.001 |
| Job | 4,452 | | | | <0.001 |
| fixed | | 2,805 (63%) | 580 (46%) | 2,225 (70%) | |
| freelance | | 1,024 (23%) | 333 (27%) | 691 (22%) | |
| others | | 171 (3.8%) | 68 (5.4%) | 103 (3.2%) | |
| partime | | 452 (10%) | 271 (22%) | 181 (5.7%) | |
| NA | | 2 | 2 | 0 | |
| Expenses | 4,454 | 51 (35, 72) | 49 (35, 75) | 52 (35, 68) | 0.8 |
| Income | 4,073 | 125 (90, 170) | 100 (74, 148) | 130 (100, 178) | <0.001 |
| NA | | 381 | 217 | 164 | |
| Assets | 4,407 | 3,000 (0, 6,000) | 0 (0, 4,000) | 4,000 (0, 7,000) | <0.001 |
| NA | | 47 | 20 | 27 | |
| Debt | 4,436 | 0 (0, 0) | 0 (0, 0) | 0 (0, 0) | 0.3 |
| NA | | 18 | 13 | 5 | |
| Amount | 4,454 | 1,000 (700, 1,300) | 1,100 (800, 1,415) | 1,000 (700, 1,250) | <0.001 |
| Price | 4,454 | 1,400 (1,117, 1,692) | 1,423 (1,062, 1,728) | 1,400 (1,134, 1,678) | >0.9 |
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson's Chi-squared test
First, we evaluate the dataset for missing values. As indicated in the table, the data do contain NA/missing values. We can create a table that shows each variable and how many missing values it has:
# Shows which variables have missing values and how many
colSums(is.na(credit))
        X    Status Seniority      Home      Time       Age   Marital   Records
        0         0         0         6         0         0         1         0
      Job  Expenses    Income    Assets      Debt    Amount     Price
        2         0       381        47        18         0         0
We must now analyze the data to decide how to handle the missing values. To do this, we create a new dataset, called new_credit, that deletes the rows with missing data. We want to preserve the original dataset so we can later apply the method we choose for addressing the missing values. We can then count the rows to determine how many were deleted in total.
# Creates a new dataset excluding missing values
new_credit = na.omit(credit)
# Number of rows of new dataset
nrow(new_credit)
[1] 4039
We started out with 4,454 rows and our new dataset has 4,039, so 415 rows were deleted due to missing data. To run a regression on complete cases, we would be throwing away 9.3% of our data because of missingness. Instead, we can use multiple imputation to impute the missing values so that we don’t have to discard such valuable information.
Using the mice (Multivariate Imputation by Chained Equations) package in R, a statistical programming language, we will create multiple datasets with imputed values for the missing values. Because just under 10% of the rows in our dataset contain missing data, we will generate 10 imputations, or 10 new datasets. The mice package does this by generating plausible values from the other columns and placing them into the cells where data are missing.
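The arithmetic behind these figures can be checked with a few lines of R. The row counts are the ones reported above; the data frame `toy` is a hypothetical stand-in that shows how the same proportion could be computed directly from data with `complete.cases()`.

```r
# Row counts reported above
n_total    <- 4454
n_complete <- 4039

n_dropped   <- n_total - n_complete      # rows lost to listwise deletion
pct_dropped <- 100 * n_dropped / n_total # percentage of rows lost

# Rule of thumb from the text: at least as many imputations as the
# percentage of missing data, rounded up
m <- ceiling(pct_dropped)

c(dropped = n_dropped, percent = round(pct_dropped, 1), imputations = m)

# The same proportion computed directly from a (toy) data frame
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
mean(!complete.cases(toy))
```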
The first step is to check the missingness by looking for patterns in the original dataset using the md.pattern() function:
credit <- credit[-c(1)]
md.pattern(credit, rotate.names = TRUE)
Status Seniority Time Age Records Expenses Amount Price Marital Job Home
4039 1 1 1 1 1 1 1 1 1 1 1
366 1 1 1 1 1 1 1 1 1 1 1
22 1 1 1 1 1 1 1 1 1 1 1
7 1 1 1 1 1 1 1 1 1 1 1
8 1 1 1 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1 1 1 0
2 1 1 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 1 1 0 1 1
0 0 0 0 0 0 0 0 1 2 6
Debt Assets Income
4039 1 1 1 0
366 1 1 0 1
22 1 0 1 1
7 1 0 0 2
8 0 0 1 2
4 0 0 0 3
3 0 0 1 3
2 0 0 0 4
1 1 1 0 2
1 0 0 0 5
1 1 1 1 1
18 47 381 455
In the plot produced by md.pattern(), blue cells are observed values and red cells are missing values. There are 11 patterns.
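In the printed matrix, each row is one missingness pattern: 1 marks an observed variable and 0 a missing one, the leftmost number is how many rows of the data follow that pattern, the rightmost number is how many variables are missing in that pattern, and the bottom row gives the number of missing values per variable. The same idea can be sketched in base R on a toy data frame; the data frame `toy` is hypothetical and the code only illustrates how patterns are counted, not how md.pattern() is implemented.

```r
# Toy data frame with gaps in two columns (illustrative values)
toy <- data.frame(a  = c(1, NA, 3, 4, NA),
                  b  = c(NA, 2, 3, 4, NA),
                  id = 1:5)

# Encode each row as a string of 1 (observed) and 0 (missing)
pattern <- apply(!is.na(toy), 1,
                 function(r) paste(as.integer(r), collapse = ""))

# Number of rows that follow each pattern
pattern_counts <- table(pattern)
pattern_counts

# Number of missing values per variable (the bottom row of md.pattern output)
colSums(is.na(toy))
```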
To perform multiple imputation on categorical data, all string variables must be converted to factors using the as.factor() function (van Buuren 2011):
credit$Status = as.factor(credit$Status)
credit$Home = as.factor(credit$Home)
credit$Marital = as.factor(credit$Marital)
credit$Records = as.factor(credit$Records)
credit$Job = as.factor(credit$Job)
Using the mice() function, 10 imputations of the missing values will be generated. The default is 5, so m must be set to the desired number of imputations. Because the dataset contains both numerical and categorical variables (with two or more levels), the defaultMethod argument lists one method per variable type: pmm, predictive mean matching (numeric data); logreg, logistic regression imputation (binary data, factor with 2 levels); polyreg, polytomous regression imputation for unordered categorical data (factor with more than 2 levels); and polr, proportional odds model (ordered factor with more than 2 levels). The seed argument is set to 1337 (any number can be used here) so that the same results are retrieved each time the multiple imputation is performed.
Multiple_Imputation = mice(data = credit, maxit = 10, m = 10, defaultMethod = c("pmm", "logreg", "polyreg", "polr"), seed = 1337)
iter imp variable
1 1 Home Marital Job Income Assets Debt
1 2 Home Marital Job Income Assets Debt
1 3 Home Marital Job Income Assets Debt
1 4 Home Marital Job Income Assets Debt
1 5 Home Marital Job Income Assets Debt
1 6 Home Marital Job Income Assets Debt
1 7 Home Marital Job Income Assets Debt
1 8 Home Marital Job Income Assets Debt
1 9 Home Marital Job Income Assets Debt
1 10 Home Marital Job Income Assets Debt
2 1 Home Marital Job Income Assets Debt
2 2 Home Marital Job Income Assets Debt
2 3 Home Marital Job Income Assets Debt
2 4 Home Marital Job Income Assets Debt
2 5 Home Marital Job Income Assets Debt
2 6 Home Marital Job Income Assets Debt
2 7 Home Marital Job Income Assets Debt
2 8 Home Marital Job Income Assets Debt
2 9 Home Marital Job Income Assets Debt
2 10 Home Marital Job Income Assets Debt
3 1 Home Marital Job Income Assets Debt
3 2 Home Marital Job Income Assets Debt
3 3 Home Marital Job Income Assets Debt
3 4 Home Marital Job Income Assets Debt
3 5 Home Marital Job Income Assets Debt
3 6 Home Marital Job Income Assets Debt
3 7 Home Marital Job Income Assets Debt
3 8 Home Marital Job Income Assets Debt
3 9 Home Marital Job Income Assets Debt
3 10 Home Marital Job Income Assets Debt
4 1 Home Marital Job Income Assets Debt
4 2 Home Marital Job Income Assets Debt
4 3 Home Marital Job Income Assets Debt
4 4 Home Marital Job Income Assets Debt
4 5 Home Marital Job Income Assets Debt
4 6 Home Marital Job Income Assets Debt
4 7 Home Marital Job Income Assets Debt
4 8 Home Marital Job Income Assets Debt
4 9 Home Marital Job Income Assets Debt
4 10 Home Marital Job Income Assets Debt
5 1 Home Marital Job Income Assets Debt
5 2 Home Marital Job Income Assets Debt
5 3 Home Marital Job Income Assets Debt
5 4 Home Marital Job Income Assets Debt
5 5 Home Marital Job Income Assets Debt
5 6 Home Marital Job Income Assets Debt
5 7 Home Marital Job Income Assets Debt
5 8 Home Marital Job Income Assets Debt
5 9 Home Marital Job Income Assets Debt
5 10 Home Marital Job Income Assets Debt
6 1 Home Marital Job Income Assets Debt
6 2 Home Marital Job Income Assets Debt
6 3 Home Marital Job Income Assets Debt
6 4 Home Marital Job Income Assets Debt
6 5 Home Marital Job Income Assets Debt
6 6 Home Marital Job Income Assets Debt
6 7 Home Marital Job Income Assets Debt
6 8 Home Marital Job Income Assets Debt
6 9 Home Marital Job Income Assets Debt
6 10 Home Marital Job Income Assets Debt
7 1 Home Marital Job Income Assets Debt
7 2 Home Marital Job Income Assets Debt
7 3 Home Marital Job Income Assets Debt
7 4 Home Marital Job Income Assets Debt
7 5 Home Marital Job Income Assets Debt
7 6 Home Marital Job Income Assets Debt
7 7 Home Marital Job Income Assets Debt
7 8 Home Marital Job Income Assets Debt
7 9 Home Marital Job Income Assets Debt
7 10 Home Marital Job Income Assets Debt
8 1 Home Marital Job Income Assets Debt
8 2 Home Marital Job Income Assets Debt
8 3 Home Marital Job Income Assets Debt
8 4 Home Marital Job Income Assets Debt
8 5 Home Marital Job Income Assets Debt
8 6 Home Marital Job Income Assets Debt
8 7 Home Marital Job Income Assets Debt
8 8 Home Marital Job Income Assets Debt
8 9 Home Marital Job Income Assets Debt
8 10 Home Marital Job Income Assets Debt
9 1 Home Marital Job Income Assets Debt
9 2 Home Marital Job Income Assets Debt
9 3 Home Marital Job Income Assets Debt
9 4 Home Marital Job Income Assets Debt
9 5 Home Marital Job Income Assets Debt
9 6 Home Marital Job Income Assets Debt
9 7 Home Marital Job Income Assets Debt
9 8 Home Marital Job Income Assets Debt
9 9 Home Marital Job Income Assets Debt
9 10 Home Marital Job Income Assets Debt
10 1 Home Marital Job Income Assets Debt
10 2 Home Marital Job Income Assets Debt
10 3 Home Marital Job Income Assets Debt
10 4 Home Marital Job Income Assets Debt
10 5 Home Marital Job Income Assets Debt
10 6 Home Marital Job Income Assets Debt
10 7 Home Marital Job Income Assets Debt
10 8 Home Marital Job Income Assets Debt
10 9 Home Marital Job Income Assets Debt
10 10 Home Marital Job Income Assets Debt
The following R code will show the imputed values. Columns are imputations, rows are observations.
head(Multiple_Imputation$imp, 10)
$Status
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Seniority
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Home
1 2 3 4 5 6 7 8 9
30 owner parents rent other owner parents rent other rent
240 parents owner priv owner rent parents parents parents parents
1060 parents parents parents priv parents owner parents other parents
1677 owner owner owner owner parents owner other owner owner
2389 rent rent parents owner other other parents rent priv
2996 owner owner rent owner parents parents parents owner owner
10
30 rent
240 rent
1060 rent
1677 rent
2389 rent
2996 owner
$Time
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Age
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Marital
1 2 3 4 5 6 7 8 9
3319 married married married married married married single married married
10
3319 married
$Records
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Job
1 2 3 4 5 6 7 8
30 freelance freelance fixed fixed partime freelance fixed fixed
912 fixed partime fixed partime freelance freelance fixed freelance
9 10
30 freelance partime
912 partime partime
$Expenses
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Income
1 2 3 4 5 6 7 8 9 10
30 71 120 92 137 151 85 130 132 245 120
114 117 148 62 89 140 65 204 98 155 148
144 151 120 959 250 250 120 230 189 100 254
153 130 99 164 95 117 176 55 156 155 100
158 116 120 265 108 250 210 85 205 130 270
177 178 240 180 230 241 800 230 183 230 241
195 120 310 227 160 112 176 300 103 163 300
206 170 116 100 240 264 223 45 63 154 285
241 80 300 160 60 105 122 60 150 168 100
242 139 320 240 164 100 133 145 121 65 187
278 135 80 80 225 55 150 73 410 199 114
318 71 151 131 60 80 76 117 60 65 120
330 296 200 176 136 130 160 120 147 150 150
333 157 176 113 80 240 189 39 200 200 93
335 234 150 105 117 126 100 210 167 65 95
356 85 71 100 112 254 130 100 52 80 132
360 165 124 170 113 59 88 108 155 71 130
394 350 500 500 150 150 500 500 350 491 491
404 150 85 105 88 107 100 120 142 185 118
422 92 205 108 145 169 205 136 160 70 150
439 260 400 125 152 86 110 115 220 155 92
444 250 190 111 115 67 196 100 138 80 225
462 173 160 242 54 60 210 122 150 102 222
469 112 145 227 178 100 145 176 92 100 112
479 135 148 77 145 145 19 80 105 147 180
481 400 190 138 115 213 116 182 72 131 137
483 154 95 51 198 85 70 79 108 144 95
485 159 80 110 150 142 145 100 170 189 125
496 93 45 146 57 86 80 82 65 50 197
498 78 176 179 148 143 85 75 108 176 100
505 60 66 60 55 218 80 53 120 198 144
567 104 76 150 148 140 120 76 148 122 148
572 158 150 175 120 127 66 76 70 85 95
582 70 56 46 46 75 51 85 70 85 49
648 120 143 430 171 180 120 240 152 144 500
653 75 90 97 120 128 95 200 55 106 110
667 178 183 245 416 241 241 416 416 245 245
675 208 300 68 140 200 150 107 251 180 154
678 130 125 100 101 88 125 76 159 122 135
699 40 123 92 128 50 110 95 157 138 90
708 150 150 65 115 41 179 95 203 160 190
714 160 85 155 105 125 85 70 102 125 194
716 63 86 115 106 90 49 95 106 61 182
733 76 225 102 100 125 105 117 110 115 110
734 160 83 124 194 101 125 81 121 107 128
746 300 105 141 30 81 192 98 105 100 178
777 195 74 94 50 96 75 73 106 57 128
781 61 156 102 241 85 140 70 240 105 75
785 100 250 69 90 160 73 220 191 250 171
804 90 125 100 218 95 224 115 142 126 178
824 60 51 25 136 58 73 57 100 130 111
865 230 136 203 100 154 166 164 126 100 164
866 137 38 90 90 134 159 110 113 120 88
880 145 102 270 300 97 180 100 250 95 214
889 230 250 73 314 310 175 200 200 106 200
906 165 125 133 140 55 166 260 191 92 315
912 60 85 70 91 115 170 93 185 117 81
942 90 60 56 126 158 90 78 73 58 55
952 178 180 180 24 188 91 110 200 40 450
989 119 85 124 106 140 95 137 104 320 129
1001 70 40 37 100 63 85 128 92 70 85
1017 150 160 115 318 159 122 300 172 107 95
1039 160 95 86 110 77 73 87 120 95 128
1044 115 27 60 65 62 92 139 33 67 150
1069 106 141 120 265 144 201 198 260 54 100
1100 67 92 92 219 60 82 170 200 82 69
1111 90 75 120 77 123 75 78 63 130 86
1125 315 175 98 115 90 168 133 65 214 145
1168 60 189 180 250 205 200 298 90 245 66
1208 150 105 66 250 289 79 70 75 185 45
1226 240 133 150 380 161 60 174 148 200 175
1250 140 112 37 80 180 50 100 106 400 33
1257 255 98 207 195 107 140 98 245 114 245
1276 142 198 19 178 213 150 180 200 160 183
1281 67 120 80 79 130 72 75 73 85 51
1289 38 141 120 75 300 103 71 75 25 110
1297 90 97 245 104 187 96 122 101 118 95
1307 182 81 52 120 130 120 157 160 205 470
1314 110 110 178 118 400 70 146 80 90 325
1335 132 197 160 63 229 138 227 112 105 90
1364 172 120 166 121 300 250 275 199 180 96
1365 204 129 222 350 45 50 150 87 251 250
1366 75 65 100 193 202 67 400 126 193 100
1392 150 491 150 350 491 500 150 491 491 350
1421 177 120 97 130 108 43 160 160 107 293
1427 400 167 244 121 61 173 40 142 164 140
1433 125 181 102 130 97 100 146 171 65 166
1436 191 120 52 110 97 101 172 129 120 90
1437 85 210 81 164 80 120 137 125 101 130
1441 57 67 139 110 78 46 102 130 134 94
1456 70 164 147 80 107 67 55 100 200 80
1473 250 356 120 207 183 68 60 99 315 100
1509 143 300 150 70 157 126 25 140 125 60
1513 161 128 95 80 60 63 49 128 128 78
1530 219 85 65 70 40 58 55 79 60 65
1535 100 100 129 68 130 150 111 190 107 75
1536 400 100 195 315 260 144 189 189 160 100
1544 95 75 81 95 76 225 130 147 95 75
1549 102 120 250 185 120 137 128 92 125 106
1564 130 92 232 292 90 225 90 182 100 125
1580 198 41 92 135 68 142 80 115 99 60
1583 70 178 82 148 122 250 110 150 60 75
1598 80 150 160 198 102 125 120 202 55 46
1599 193 71 125 140 50 101 120 47 55 115
1619 115 232 150 131 182 203 105 230 70 164
1629 141 115 130 124 163 83 140 67 80 80
1648 130 128 80 67 60 70 152 130 60 92
1662 117 117 70 60 150 253 125 240 100 77
1677 83 189 193 240 300 110 131 147 131 70
1685 100 159 154 104 128 150 212 63 400 150
1722 54 160 198 62 200 150 193 63 45 207
1724 160 143 263 75 140 80 166 108 112 178
1733 100 143 160 247 290 195 60 214 274 275
1741 150 137 70 101 73 127 80 62 128 50
1745 135 114 39 67 500 246 39 160 60 268
1753 60 150 92 92 50 182 169 150 70 63
1762 70 127 78 110 65 93 125 178 154 110
1766 428 150 145 108 50 165 315 118 183 250
1771 195 110 140 101 125 250 330 260 125 162
1798 144 130 135 190 187 200 100 110 144 110
1802 500 150 150 150 150 491 500 500 491 500
1803 120 100 247 130 192 146 145 85 82 81
1807 160 95 80 120 107 215 160 163 81 76
1811 84 107 105 150 208 150 88 117 95 70
1844 120 180 230 158 220 150 173 112 183 254
1851 152 200 350 200 250 125 250 288 230 191
1852 195 91 120 101 54 93 53 117 96 73
1870 74 100 19 475 500 112 165 120 78 162
1872 157 312 52 115 107 127 113 122 150 110
1882 130 130 55 128 55 75 60 138 92 43
1883 226 67 345 160 105 218 188 137 310 208
1893 500 500 500 150 200 350 500 491 150 905
1898 225 95 165 112 140 129 230 123 56 89
1903 95 67 90 70 67 42 105 101 56 35
1907 250 223 290 133 83 96 185 424 250 173
1920 55 175 60 134 92 60 135 130 222 80
1936 200 190 394 159 296 125 260 211 155 60
1946 140 210 57 190 105 133 80 170 103 170
1948 85 85 150 116 110 150 147 135 70 75
1962 64 100 120 113 115 122 105 120 88 164
1963 394 195 535 99 160 110 274 105 83 130
1965 150 83 148 64 76 110 150 119 125 180
1970 300 100 230 115 230 459 250 315 300 180
1972 500 491 500 500 491 500 500 500 500 491
1977 117 167 199 190 60 200 208 180 125 92
1979 145 75 122 256 150 100 125 107 172 214
1980 92 81 84 86 56 40 90 52 112 85
1984 142 130 148 250 320 65 260 274 251 190
2006 160 140 140 247 225 250 145 73 212 183
2016 93 200 150 150 211 175 120 130 140 195
2022 75 142 183 148 247 100 110 154 150 198
2025 128 155 356 400 157 371 300 170 240 129
2042 60 120 145 220 120 128 324 315 200 98
2043 146 115 90 95 121 253 79 150 94 107
2076 98 116 105 150 85 124 70 106 158 283
2077 150 90 115 65 233 120 79 38 100 158
2083 191 120 55 265 136 250 118 55 126 60
2156 211 155 233 40 25 190 160 90 223 163
2157 122 132 47 126 200 128 150 74 91 64
2186 128 48 79 82 147 100 134 114 235 80
2197 150 260 142 184 102 100 98 139 92 90
2205 55 130 236 210 122 93 165 51 428 158
2218 187 233 125 100 90 68 110 79 208 184
2227 65 85 82 110 58 42 125 48 128 85
2233 130 90 92 55 225 122 139 67 150 110
2240 179 113 61 71 75 70 205 197 355 71
2257 138 77 100 93 138 154 75 118 133 115
2280 144 200 350 166 700 152 166 700 137 350
2291 113 210 85 152 125 230 150 136 160 220
2297 127 77 145 125 150 122 150 84 82 120
2304 85 135 33 121 45 165 49 53 48 49
2310 100 125 189 150 106 115 50 271 160 320
2323 95 26 70 90 70 71 140 126 101 70
2331 905 500 491 200 905 500 150 500 905 491
2337 65 115 77 137 70 85 90 129 117 335
2349 303 160 384 110 214 106 208 189 46 214
2365 61 68 100 138 122 80 240 310 500 250
2369 324 175 120 150 107 135 106 95 163 135
2387 40 198 120 101 71 139 106 73 60 128
2396 150 162 100 91 208 53 113 237 128 33
2399 144 212 245 170 84 235 138 160 100 161
2402 79 200 245 158 300 120 109 294 290 172
2404 149 150 150 176 195 140 175 127 160 60
2437 250 300 195 115 275 144 315 459 120 120
2445 187 345 131 150 125 110 164 100 207 120
2446 130 95 135 104 150 60 333 163 300 100
2453 60 131 95 90 183 113 120 103 220 81
2460 128 92 135 221 60 30 66 128 70 60
2467 58 70 115 85 50 40 120 72 70 70
2473 148 105 95 215 124 97 100 180 51 59
2490 85 165 35 72 115 67 102 75 35 85
2495 53 70 72 130 56 95 110 63 126 78
2505 92 142 6 150 78 120 107 300 264 107
2566 110 128 100 71 80 130 160 63 91 77
2572 85 117 254 55 107 115 106 50 109 90
2578 208 85 103 131 110 131 164 87 83 218
2584 62 125 125 67 111 133 53 178 82 113
2596 129 102 66 85 113 91 130 87 80 162
2605 130 115 90 80 60 80 58 70 121 209
2614 140 300 120 138 102 42 175 54 242 180
2624 174 55 110 105 142 110 116 148 43 39
2625 167 100 77 161 110 142 173 181 100 124
2631 63 149 144 147 150 175 155 150 190 324
2632 118 86 92 145 210 70 285 220 168 123
2651 92 105 95 125 108 151 150 161 64 150
2652 50 86 215 88 78 267 132 140 140 116
2653 231 219 101 199 170 80 100 126 200 216
2668 150 150 92 68 112 130 133 100 250 176
2676 211 166 145 240 85 200 240 136 100 125
2681 100 110 114 128 147 121 163 80 107 113
2683 80 79 120 151 25 87 150 81 164 80
2695 182 210 200 100 117 196 120 114 180 150
2696 65 123 67 125 60 60 48 70 83 75
2707 200 129 185 90 125 151 81 190 122 89
2720 110 210 107 121 210 260 170 166 155 274
2723 106 90 135 82 92 130 90 107 70 86
2725 100 250 400 275 250 150 65 250 314 313
2730 120 180 201 65 101 125 68 100 168 92
2769 80 79 85 129 115 72 51 107 90 88
2780 72 63 63 70 96 42 126 32 72 63
2781 161 138 113 320 146 210 145 160 144 70
2802 135 92 115 237 130 180 90 105 169 107
2805 90 75 175 390 125 100 260 350 426 124
2806 160 73 160 125 55 134 134 70 107 100
2807 160 150 120 202 50 89 57 122 73 129
2810 117 180 110 81 149 62 116 306 176 125
2813 166 115 120 143 80 200 187 120 100 150
2815 111 75 140 87 137 115 67 155 90 80
2825 384 606 100 130 155 168 380 85 380 82
2854 96 300 107 300 150 300 222 140 857 130
2869 160 251 130 133 75 42 154 90 105 212
2882 60 87 106 147 74 47 80 75 130 69
2884 120 54 80 133 144 145 64 58 80 215
2893 210 164 42 170 211 135 56 125 64 135
2915 50 70 185 131 120 115 122 200 96 108
2927 83 130 185 78 89 75 147 180 120 105
2935 92 158 75 92 75 70 192 130 100 40
2936 135 80 466 112 318 70 104 120 289 213
2939 131 153 71 232 230 122 100 150 50 120
2951 300 240 178 100 230 183 905 416 150 241
2954 125 137 300 250 200 300 166 250 125 300
2969 140 161 190 95 107 70 166 222 121 187
2971 400 53 114 340 470 54 81 340 25 135
2979 254 120 180 120 180 100 100 120 142 210
2983 72 90 126 117 85 130 100 138 87 170
2991 90 120 113 105 150 187 90 173 76 49
2996 122 150 134 140 200 124 148 171 221 200
2999 83 140 300 120 113 345 110 93 145 138
3008 208 350 50 700 152 200 180 700 700 250
3014 170 98 100 178 93 64 156 185 410 90
3021 96 126 127 199 192 186 76 270 125 128
3026 125 140 107 155 119 165 39 130 45 440
3031 120 65 70 78 83 85 105 92 22 52
3038 42 49 49 67 53 121 67 121 121 67
3040 266 175 130 175 130 371 190 158 857 300
3069 251 200 95 121 235 426 300 78 170 142
3080 150 100 105 283 120 129 60 180 73 65
3096 247 89 120 90 130 131 128 129 181 46
3104 74 170 158 65 85 181 110 120 163 80
3106 352 203 136 146 200 190 246 72 274 120
3110 73 215 192 245 294 254 271 180 464 100
3121 300 294 67 390 254 310 90 359 90 233
3123 130 538 155 110 233 300 42 120 91 244
3139 175 321 230 260 300 275 350 189 250 250
3167 120 90 142 185 189 133 75 79 80 196
3170 150 234 148 120 80 116 90 107 126 100
3183 64 121 181 132 86 140 222 106 61 182
3185 200 106 140 160 205 100 155 318 173 70
3187 90 178 125 95 50 160 94 94 150 160
3203 133 130 60 97 78 151 115 135 142 228
3218 73 150 160 199 188 164 103 107 132 85
3222 154 90 155 157 160 160 200 130 333 100
3229 91 160 185 260 96 250 155 125 122 70
3233 118 68 320 290 230 250 120 400 120 55
3237 250 118 85 146 143 141 105 113 160 88
3245 500 185 68 100 112 112 113 140 164 100
3252 70 197 120 102 65 91 93 165 100 66
3266 180 110 101 161 107 110 60 210 147 130
3286 120 100 152 60 125 141 55 100 109 100
3288 217 225 200 110 114 123 229 110 63 119
3304 500 491 150 491 241 500 905 491 905 905
3310 100 65 100 75 500 200 90 142 200 350
3316 200 150 105 96 153 150 115 75 93 156
3325 172 60 92 205 104 90 134 117 81 59
3336 275 225 285 350 125 200 198 95 181 100
3338 183 416 416 200 245 178 416 241 800 230
3345 90 160 75 211 189 110 170 146 211 157
3352 143 163 75 211 318 197 200 125 143 80
3365 182 73 297 81 130 54 55 75 132 79
3382 139 225 115 118 225 143 150 140 144 110
3433 120 50 90 60 139 135 81 65 55 80
3439 221 87 137 59 92 120 40 221 57 100
3451 100 73 130 72 70 139 78 66 78 70
3452 95 75 100 161 121 52 75 90 47 140
3454 72 110 57 89 150 131 350 130 71 83
3456 178 95 120 217 67 120 90 106 117 105
3461 164 140 125 140 150 110 189 135 160 125
3462 135 280 125 265 148 164 150 129 163 216
3473 200 141 125 110 112 110 80 90 60 198
3477 131 92 137 160 247 142 184 77 150 115
3478 121 240 400 69 146 150 140 100 426 140
3494 75 70 116 59 58 116 200 87 218 52
3513 64 150 80 235 115 108 64 120 75 149
3523 233 93 155 120 69 180 171 131 140 125
3525 260 80 196 205 93 180 163 290 80 82
3534 95 175 125 115 140 100 250 222 204 145
3556 139 213 125 260 195 384 478 300 195 186
3641 65 320 240 325 176 260 112 125 208 160
3645 235 140 111 203 63 110 105 85 104 129
3657 93 166 148 56 90 60 80 92 79 63
3674 60 75 134 72 117 70 195 141 57 85
3679 183 114 89 125 200 114 245 110 100 50
3691 92 200 130 100 114 120 210 112 142 145
3704 122 80 124 31 98 145 114 130 120 199
3709 250 67 167 52 217 154 197 130 300 200
3714 77 70 70 75 88 62 90 136 53 335
3717 88 65 140 95 111 115 60 76 103 63
3730 59 93 110 118 180 51 95 150 75 120
3740 227 128 200 120 90 75 100 124 141 110
3763 125 110 340 131 79 70 17 92 125 106
3768 58 177 175 265 105 155 116 100 200 108
3773 300 350 300 383 464 320 464 125 321 200
3794 125 130 190 110 61 190 237 197 54 180
3800 72 105 101 149 63 204 75 74 165 88
3823 140 145 136 90 199 115 100 42 87 64
3825 53 67 121 121 121 53 49 33 33 49
3850 122 120 122 90 63 107 415 120 80 65
3855 63 134 250 208 140 340 200 125 174 225
3857 122 30 70 119 100 50 45 119 61 72
3858 60 70 128 66 110 110 198 93 73 90
3882 83 140 175 96 120 84 73 170 91 350
3887 140 93 208 95 290 100 130 135 102 150
3892 220 355 195 71 230 201 208 45 210 78
3902 114 63 189 255 84 150 290 137 152 110
3914 82 74 38 67 99 92 87 150 78 130
3928 275 189 430 350 195 180 94 160 300 350
3932 108 100 100 137 143 125 150 100 97 110
3945 205 114 139 85 247 154 125 116 104 150
3946 130 247 122 314 109 166 205 144 191 144
3947 70 49 70 182 70 120 70 72 70 82
3951 75 194 179 180 218 78 150 120 108 130
3955 50 110 135 60 50 202 75 88 73 75
3966 178 115 84 98 109 87 90 111 175 170
3992 102 130 150 185 120 130 100 67 60 112
4003 81 150 196 110 85 106 173 132 200 159
4023 178 79 160 211 230 205 700 146 228 150
4036 65 90 113 66 154 57 25 146 80 79
4049 121 56 49 53 121 75 88 49 33 85
4064 95 120 84 128 85 139 117 114 110 117
4069 142 147 157 115 147 45 250 75 195 150
4076 75 53 121 49 49 162 33 162 56 53
4082 450 107 113 157 60 140 100 190 90 108
4085 45 240 114 300 110 150 45 371 126 115
4096 145 80 125 70 90 133 143 212 96 165
4119 160 156 139 115 110 57 30 210 86 55
4159 150 208 135 60 119 98 349 122 75 182
4168 87 141 100 110 200 140 100 74 92 65
4173 90 105 75 40 70 128 221 93 82 126
4181 200 165 53 109 80 120 210 170 83 100
4191 80 213 124 150 94 100 110 141 200 179
4198 250 126 320 300 100 275 390 360 200 411
4199 54 77 112 156 98 318 180 180 110 173
4222 130 72 90 60 215 63 92 56 70 78
4223 158 70 93 64 42 65 80 219 167 138
4237 111 155 256 123 69 90 122 62 67 130
4246 142 90 68 306 110 90 60 55 110 92
4247 60 98 80 72 78 96 144 70 90 66
4256 150 50 300 250 208 300 250 87 146 68
4281 60 90 65 335 86 66 105 115 60 63
4295 75 62 124 194 45 88 105 134 100 106
4333 63 73 139 78 169 120 130 84 72 135
4349 200 196 115 240 426 157 169 155 135 141
4368 130 108 105 117 283 101 27 108 80 80
4373 75 175 65 70 101 85 188 85 172 160
4398 135 50 92 325 146 200 130 70 160 250
4411 100 300 160 245 315 120 62 92 250 182
4420 150 500 491 491 500 491 500 150 500 500
4433 114 145 140 100 300 192 130 100 155 92
4436 92 160 185 60 100 60 174 100 125 117
4440 150 132 280 149 95 40 238 120 63 98
4441 290 131 161 183 296 183 125 133 214 230
We can check the quality of the imputations with a strip plot, which is a single-axis scatter plot. It shows the distribution of each variable in each imputed dataset. We want the imputed values to be plausible, that is, values that could have been observed had the data not been missing.
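Before plotting, the same idea can be checked numerically: do the imputed values stay inside the range of the observed values? The sketch below is only an illustration. The helper name `imputed_within_observed` and the toy numbers are assumptions, not part of the original analysis.

```r
# Hypothetical helper: TRUE if every imputed value lies inside the observed range.
# observed: numeric vector that may contain NA
# imputed:  vector, matrix, or data frame of imputed values
imputed_within_observed <- function(observed, imputed) {
  obs_range <- range(observed, na.rm = TRUE)
  vals <- unlist(imputed)
  all(vals >= obs_range[1] & vals <= obs_range[2])
}

# Toy values for illustration only (not taken from the credit data)
toy_observed <- c(120, NA, 95, 300, NA, 150)
toy_imputed <- data.frame(`1` = c(120, 150), `2` = c(95, 300), check.names = FALSE)

in_range <- imputed_within_observed(toy_observed, toy_imputed)
print(in_range)
```

For a mids object created by mice, the observed data are stored in `$data` and the imputed values in `$imp`, so the call would presumably look like `imputed_within_observed(Multiple_Imputation$data$Income, Multiple_Imputation$imp$Income)`. A range check is much coarser than the strip plot, so it complements the visual inspection rather than replacing it.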
par(mfrow=c(7,2))
stripplot(Multiple_Imputation, Status, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Seniority, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Home, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Time, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Age, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Marital, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Records, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Job, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Expenses, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Income, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Assets, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Debt, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Amount, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Price, pch = 19, xlab = "Imputation number")
Next, we fit the analysis model to each imputed dataset with the with() function and pool the resulting estimates, so that the final estimates properly account for the uncertainty caused by the missing data. The summary of the pooled results gives the estimate, standard error, test statistic, degrees of freedom, and p-value for each term in the model.
# fit complete-data model
fit <- with(Multiple_Imputation, glm(Status ~ Seniority + Home + Time + Age + Marital + Records + Job + Expenses + Income + Assets + Debt + Amount + Price, family = binomial))
# pool and summarize the results
summary(pool(fit))
   term estimate std.error statistic df
1 (Intercept) 1.000072e+00 7.333866e-01 1.36363518 4086.1682
2 Seniority 8.307737e-02 7.454796e-03 11.14415122 4392.3238
3 Homeother 6.182192e-02 5.726574e-01 0.10795621 4373.7006
4 Homeowner 1.153055e+00 5.581845e-01 2.06572343 4411.1415
5 Homeparents 9.338966e-01 5.670622e-01 1.64690319 4381.7089
6 Homepriv 4.280971e-01 5.756352e-01 0.74369521 4410.1427
7 Homerent 4.120033e-01 5.614812e-01 0.73377929 4401.8470
8 Time -2.817114e-04 3.477564e-03 -0.08100826 4191.6466
9 Age -1.088836e-02 4.994322e-03 -2.18014705 4133.3853
10 Maritalmarried 6.047650e-01 4.180067e-01 1.44678300 4197.5359
11 Maritalseparated -6.783140e-01 4.624775e-01 -1.46669624 4221.1376
12 Maritalsingle 1.599798e-01 4.236380e-01 0.37763332 4183.9203
13 Maritalwidow 1.680857e-01 5.277448e-01 0.31849802 4336.1022
14 Recordsyes -1.783863e+00 1.020700e-01 -17.47686040 4250.4947
15 Jobfreelance -7.627158e-01 1.017639e-01 -7.49495736 4251.6598
16 Jobothers -7.035646e-01 2.018017e-01 -3.48641453 4404.0341
17 Jobpartime -1.475815e+00 1.258215e-01 -11.72942735 4397.6722
18 Expenses -1.508892e-02 2.636234e-03 -5.72366383 3134.7955
19 Income 7.021486e-03 7.549578e-04 9.30050160 208.2641
20 Assets 2.041218e-05 6.455763e-06 3.16185408 363.8802
21 Debt -1.607499e-04 3.604554e-05 -4.45963310 335.9344
22 Amount -1.934283e-03 1.717728e-04 -11.26070778 3999.3868
23 Price 8.747658e-04 1.263752e-04 6.92197277 4173.7383
p.value
1 1.727575e-01
2 1.835445e-28
3 9.140354e-01
4 3.891287e-02
5 9.964965e-02
6 4.571005e-01
7 4.631223e-01
8 9.354393e-01
9 2.930279e-02
10 1.480324e-01
11 1.425332e-01
12 7.057222e-01
13 7.501225e-01
14 4.190340e-66
15 8.023279e-14
16 4.943215e-04
17 2.624321e-31
18 1.140839e-08
19 1.968912e-17
20 1.699297e-03
21 1.121017e-05
22 5.574744e-29
23 5.134169e-12
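The pooled estimates and standard errors in this table are combined across imputations using Rubin's rules. As a minimal sketch of that arithmetic, the hypothetical helper `pool_rubin` below pools a single coefficient in base R. The function name and the toy numbers are assumptions for illustration and do not come from the credit data.

```r
# Hypothetical helper: pool one coefficient across m imputed-data fits
# using Rubin's rules.
# est: per-imputation point estimates; se: their standard errors
pool_rubin <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(se^2)                # average within-imputation variance
  b     <- var(est)                  # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, std.error = sqrt(t_var))
}

# Toy numbers for illustration only
toy_est <- c(0.50, 0.52, 0.48, 0.51, 0.49)
toy_se  <- c(0.10, 0.11, 0.10, 0.09, 0.10)

res <- pool_rubin(toy_est, toy_se)
print(res)
```

The per-imputation fits returned by with() are stored in `fit$analyses`, so the inputs for one term could be collected with something like `sapply(fit$analyses, function(f) coef(f)["Income"])`. The pooled standard error is larger than a simple average of the individual standard errors because the between-imputation variance reflects the extra uncertainty due to the missing values.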
We now have pooled estimates that account for the missing data, along with full, complete datasets that we can analyze!
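To work with a completed dataset directly, mice provides the complete() function. The sketch below runs on the small nhanes dataset that ships with mice rather than on the credit data, and the object names `demo_imp`, `completed_1`, and `completed_long` are illustrative assumptions.

```r
library(mice)

# Illustration on mice's built-in nhanes data (an assumption; not the credit data)
demo_imp <- mice(nhanes, m = 5, seed = 2024, printFlag = FALSE)

# First completed dataset: observed values plus the first set of imputations
completed_1 <- complete(demo_imp, 1)

# All completed datasets stacked in long format, with .imp and .id columns
completed_long <- complete(demo_imp, action = "long")

# Check that no missing values remain in the first completed dataset
print(anyNA(completed_1))
```

With the object used in this document, the equivalent call would be `complete(Multiple_Imputation, 1)`. For inference, the pooled results remain preferable to an analysis of a single completed dataset, because a single dataset ignores the variation between imputations.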
In conclusion, missing data can occur in research for a variety of reasons, and it is never a good idea to ignore it. Ignoring it leads to biased parameter estimates, loss of information, decreased statistical power, and weak reliability of findings (Dong and Peng 2013). The best course of action is to handle the missing data with multiple imputation. When missing data is discovered, it is important to first identify it and look for missing data patterns. Next, define the variables in the dataset that are related to the missing values and will be used for imputation. Create the necessary number of imputed datasets. Fit the model to each imputed dataset, pool the results, and finally interpret the pooled estimates. Performing these steps will minimize the adverse effects of missing data on the analysis (Pampaka, Hutcheson, and Williams 2016).